Given the scarcity of data pertaining to whole-genome sequences of cyanobacterial strains isolated in Brazil, we hereby present 
the draft genome sequence of Cyanobium sp. strain CACIAM 14, isolated in southeastern Amazonia. 
Given their ability to synthesize a wide variety of biologically 
active products (1-3), the cyanobacteria have played a relevant 
role in modern biotechnology. Their full biotechnological 
potential (4, 5) might be more easily exploited with knowledge of 
their genetic content. We have reconstructed the genome of a 
cyanobacterial species by means of a bioinformatics pipeline 
applied to reads obtained from a non-axenic culture of a 
Cyanobium sp. Data pertaining to genomes of cyanobacteria 
isolated in Brazil are scarce, with only a few strains currently 
sequenced (6, 7). 

We applied a next-generation sequencing pipeline according to 
Albertsen et al. (8) in order to obtain the genomic data 
pertaining to Cyanobium sp. strain CACIAM 14, a unicellular 
cyanobacterium isolated from a water sample collected in 
December 2010 in the Tucuruí Hydroelectric Dam 
(3°49'55"S, 49°38'50"W) in the State of Pará, Brazil. 

Two genomic DNA samples were obtained from non-axenic 
cyanobacterial biomass cultured 6 months apart. The two 
non-paired libraries were sequenced using the GS FLX 454 platform 
(Roche Life Science), yielding 660,228 (~255 Mb) and 815,325 
(~357 Mb) reads for the first and second runs, respectively. 

The datasets were assembled separately with Newbler 2.6 (minimum 
read size, 45 bp; minimum overlap, 40 bp; minimum overlap 
identity, 90%). These assemblies generated 3,654 and 3,256 
contigs longer than 1 kb, with N50 values of 2,149 bp and 
37,998 bp, respectively. 
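
The assembly statistics reported above (number of contigs above a length cutoff and N50) can be illustrated with a minimal Python sketch. It is not the procedure used by Newbler or by the authors; the function names `filter_min_length` and `n50` and the toy contig lengths are hypothetical and serve only to show the definition of N50 as the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, reaches half of the total assembly length.

```python
def filter_min_length(lengths, min_len=1000):
    """Keep only contigs strictly longer than min_len bp."""
    return [n for n in lengths if n > min_len]


def n50(lengths):
    """Return the N50 (in bp) of a collection of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0  # no contigs supplied


# Hypothetical contig lengths for illustration only (not data from this study).
toy_contigs = [500, 1200, 2000, 3500, 8000]
kept = filter_min_length(toy_contigs)
print(len(kept), "contigs > 1 kb; N50 =", n50(kept), "bp")
```

Because N50 is weighted by length, a few long contigs can raise it substantially even when the number of contigs is similar, which is consistent with two assemblies of comparable contig counts showing very different N50 values.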

The contigs were identified and separated by putative organism 
using a metagenomic assembly pipeline (8), i.e., into the 
isolated cyanobacterium and its associated heterotrophic 
bacterium (9, 10). The assembled contigs from the second run were 
used to determine the genome coverage. 

Our analyses permitted the recovery of a draft genome which 
contains 71 contigs (total of ~3.2 Mb), ranging from 5,275 to 
403,256 bp. The average coverage was 25×, with an assembly N50 
of 61,937 bp and GC content of 68.56%. 
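
Summary values of this kind (contig count, total size, length range, and GC content) can be derived directly from a FASTA file of contigs. The following Python sketch shows one way to do so under stated assumptions: the helper names `read_fasta` and `gc_percent` are hypothetical, the toy records are not data from this study, and ambiguous bases (e.g., N) are excluded from the GC denominator, which is a common convention but not one stated in the text.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gc_percent(sequences):
    """GC percentage over all sequences, counting only A, C, G, and T."""
    gc = at = 0
    for seq in sequences:
        s = seq.upper()
        gc += s.count("G") + s.count("C")
        at += s.count("A") + s.count("T")
    return 100.0 * gc / (gc + at) if (gc + at) else 0.0


# Toy records for illustration only (assumption: plain nucleotide FASTA).
toy = [">contig_1", "GGCC", "AT", ">contig_2", "ATAT"]
records = list(read_fasta(toy))
lengths = [len(seq) for _, seq in records]
print(len(records), "contigs;", sum(lengths), "bp total;",
      min(lengths), "to", max(lengths), "bp;",
      round(gc_percent(seq for _, seq in records), 2), "% GC")
```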

The pipeline (8) applied 107 hidden Markov models for essential 
genes present in a single copy in 95% of all bacteria. The draft 
genome sequence we present contains 108 of these genes, 
including duplications of the TIGR00436 and TIGR02350 genes. 
Usually, only TIGR00436 is duplicated. 
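
As an illustration of how single-copy marker hits can be tallied to flag duplications, the Python sketch below counts hits per model identifier. It assumes the HMM search results have already been reduced to a flat list of model IDs, one entry per hit; the function name `summarize_markers` and the placeholder ID `MODEL_A` are hypothetical, and the toy list is not the hit table from this study.

```python
from collections import Counter


def summarize_markers(hit_model_ids):
    """Summarize marker-gene hits: total, distinct models, duplicated models."""
    counts = Counter(hit_model_ids)
    return {
        "total_hits": sum(counts.values()),
        "distinct_models": len(counts),
        "duplicated": sorted(m for m, n in counts.items() if n > 1),
    }


# Toy hit list for illustration only; MODEL_A is a placeholder identifier.
toy_hits = ["TIGR00436", "TIGR00436", "TIGR02350", "TIGR02350", "MODEL_A"]
print(summarize_markers(toy_hits))
```

In a binned metagenome, an unexpected duplication of a marker that is normally single copy can indicate either a genuine duplication or a contig from another organism assigned to the bin, so such cases are usually worth inspecting individually.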

Structural annotation was carried out with the Prokaryotic 
Genome Annotation Pipeline (PGAP) tool, available on the NCBI 
website (11), resulting in 2,935 annotated coding sequences 
(CDSs) and 40 tRNA genes. The rRNA clusters were predicted by 
the RNAmmer tool (12). The 16S rRNA gene was, however, found to 
be missing from this version of the draft sequence. Taxonomic 
identification was thus carried out based on the polymorphisms of 
the alpha and beta subunits of phycocyanin (according to 
Dall'Agnol et al. [13]), which presented 93% and 88% nucleotide 
identity, respectively, to those of Cyanobium gracile PCC 6307. 
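
For reference, percent nucleotide identity between two aligned sequences can be computed as in the Python sketch below. The text does not state how the identity values were obtained, so this is only one common definition, in which columns containing a gap in either sequence are excluded; the function name `percent_identity` and the toy alignment are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over ungapped columns of two aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Toy alignment for illustration only (not phycocyanin sequences).
print(percent_identity("ACGT-A", "ACGATA"))
```

Other conventions, such as counting gapped columns as mismatches or dividing by the length of the shorter sequence, give different values, so the definition should be reported alongside the percentage.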

A preliminary analysis of the draft genome sequence using the 
antiSMASH tool (14) revealed the presence of 3 terpene clusters 
and 4 bacteriocin synthesis clusters. 

Nucleotide sequence accession numbers. This whole-genome 
shotgun project has been deposited at DDBJ/EMBL/GenBank under 
the accession number JMRP00000000. The version described 
in this paper is version JMRP01000000. 